496: feat: Replace from_avro_datum* functions with GenericDatumReader#61
496: feat: Replace from_avro_datum* functions with GenericDatumReader#61martin-augment wants to merge 1 commit intomainfrom
from_avro_datum* functions with GenericDatumReader#61Conversation
WalkthroughThis pull request refactors the Avro datum reading API by introducing a new ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the Avro datum reading mechanism by introducing a new Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| } else { | ||
| Ok(value) | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing enclosing namespace in reader schema resolution
High Severity
read_value passes None as the enclosing_namespace argument to resolve_internal, but the old from_avro_datum code called value.resolve(schema) which internally calls resolve_schemata, which passes schema.namespace() as the enclosing namespace. This means schemas with namespaces that contain Schema::Ref entries relying on the enclosing namespace for fully-qualified name resolution will fail to resolve correctly. The third argument in the value.resolve_internal(reader, resolved.get_names(), None, &None) call needs to be reader.namespace() instead of None.
There was a problem hiding this comment.
value:incorrect-but-reasonable; category:bug; feedback: The Bugbot AI reviewer is not correct! It does not matter whether the namespace for the top-level Schema will be None or schema.namespace(). The implementation falls back to schema.namespace() if the enclosing is None. So, it behaves the same.
🤖 Augment PR SummarySummary: This PR replaces the old standalone Changes:
Technical Notes: Decoding now consistently goes through 🤖 Was this summary useful? React with 👍 or 👎 |
| pub fn read_value<R: Read>(&self, reader: &mut R) -> AvroResult<Value> { | ||
| let value = decode_internal(self.writer, self.resolved.get_names(), None, reader)?; | ||
| if let Some((reader, resolved)) = &self.reader { | ||
| value.resolve_internal(reader, resolved.get_names(), None, &None) |
There was a problem hiding this comment.
GenericDatumReader::read_value calls value.resolve_internal(..., None, &None), but Value::resolve(_schemata) passes schema.namespace() as enclosing_namespace. This difference could break resolution for unqualified Schema::Ref names in the reader schema (especially for programmatically-constructed schemas).
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:incorrect-but-reasonable; category:bug; feedback: The Augment AI reviewer is not correct! It does not matter whether the namespace for the top-level Schema will be None or schema.namespace(). The implementation falls back to schema.namespace() if the enclosing is None. So, it behaves the same.
| /// **NOTE** This function has a quite small niche of usage and does NOT take care of reading the | ||
| /// header and consecutive data blocks; use [`Reader`](struct.Reader.html) if you don't know what | ||
| /// you are doing, instead. | ||
| #[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")] |
There was a problem hiding this comment.
The deprecation notes (and the doc comment above) point users to DatumReader, but there doesn’t appear to be any DatumReader type/trait exported in this crate; this may confuse users trying to follow the migration guidance.
Severity: low
Other Locations
avro/src/reader/datum.rs:37avro/src/reader/datum.rs:176avro/src/reader/datum.rs:208
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.
|
PR 61 Code Review - see details below |
|
Bug: In read_value (datum.rs:122), resolve_internal is called with None as enclosing_namespace. The old from_avro_datum used value.resolve(schema) which passes schema.namespace() as enclosing_namespace. For schemas with a namespace, this change could break reference resolution. Suggested fix: pass reader_schema.namespace() instead of None. Warning: All 3 deprecated functions say note=Use DatumReader instead but the type is GenericDatumReader. Minor: datum.rs:119 Read a Avro -> Read an Avro. Positives: consistent API with GenericDatumWriter, ResolvedSchema pre-computed at build time, good use of bon type-state pattern for reader_schemata ordering. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/reader/datum.rs`:
- Line 37: Update doc comments and deprecation notes to reference the real
public type GenericDatumReader and use intra-doc links: replace the reference
"DatumReader" in the doc comment that builds a DatumReader with
"GenericDatumReader" (symbol: GenericDatumReader), change the brittle HTML link
"struct.Reader.html" to an intra-doc link "crate::Reader" (symbol:
crate::Reader), and update all deprecation notes that say "Use `DatumReader`
instead" to "Use `GenericDatumReader` instead" (symbols: deprecation blocks near
the original lines referencing DatumReader/GenericDatumReader) so docs point to
the correct type and use stable intra-doc syntax.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 60a47493-3968-4fa4-8c60-a13c1a7fe460
📒 Files selected for processing (12)
avro/src/lib.rsavro/src/reader/datum.rsavro/src/reader/mod.rsavro/src/schema/mod.rsavro/src/schema/resolve.rsavro/src/serde/de.rsavro/src/writer/datum.rsavro/tests/avro-3786.rsavro/tests/avro-3787.rsavro/tests/io.rsavro/tests/schema.rsavro/tests/to_from_avro_datum_schemata.rs
|
|
||
| #[bon] | ||
| impl<'s> GenericDatumReader<'s> { | ||
| /// Build a [`DatumReader`]. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Verify whether a DatumReader symbol exists in the crate.
rg -nP '\b(pub\s+type\s+DatumReader|struct\s+DatumReader|enum\s+DatumReader|trait\s+DatumReader)\b' avro/src || true
# 2) Locate all references that may need update.
rg -n 'DatumReader|struct\.Reader\.html' avro/src/reader/datum.rs avro/src/lib.rsRepository: martin-augment/avro-rs
Length of output: 2041
🏁 Script executed:
#!/bin/bash
# Read the file and show context around the mentioned lines
sed -n '30,50p' avro/src/reader/datum.rs | cat -n
echo "---"
sed -n '140,160p' avro/src/reader/datum.rs | cat -n
echo "---"
sed -n '170,185p' avro/src/reader/datum.rs | cat -n
echo "---"
sed -n '200,215p' avro/src/reader/datum.rs | cat -nRepository: martin-augment/avro-rs
Length of output: 3451
Update documentation and deprecation notes to reference GenericDatumReader instead of non-existent DatumReader, and replace brittle doc link with intra-doc syntax.
The codebase does not define a DatumReader type. References in docs and deprecation notes should point to GenericDatumReader, which is the actual public type. Additionally, the documentation link at line 145 uses brittle struct.Reader.html syntax and should use the intra-doc link crate::Reader instead.
Affected locations:
- Line 37: Doc comment references
DatumReader - Line 145: Link uses
struct.Reader.html - Lines 147, 176, 208: Deprecation notes all say "Use
DatumReaderinstead"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/reader/datum.rs` at line 37, Update doc comments and deprecation
notes to reference the real public type GenericDatumReader and use intra-doc
links: replace the reference "DatumReader" in the doc comment that builds a
DatumReader with "GenericDatumReader" (symbol: GenericDatumReader), change the
brittle HTML link "struct.Reader.html" to an intra-doc link "crate::Reader"
(symbol: crate::Reader), and update all deprecation notes that say "Use
`DatumReader` instead" to "Use `GenericDatumReader` instead" (symbols:
deprecation blocks near the original lines referencing
DatumReader/GenericDatumReader) so docs point to the correct type and use stable
intra-doc syntax.
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.
There was a problem hiding this comment.
Code Review
This pull request successfully refactors the from_avro_datum* functions into a new GenericDatumReader with a builder pattern, which improves the API. The changes are applied consistently across the codebase, including tests. I've found a few minor documentation typos and a logic bug in one of the refactored deprecated functions. Please see my comments for details.
Note: Security Review did not run due to the size of the PR.
| GenericDatumReader::builder(writer_schema) | ||
| .writer_schemata(writer_schemata)? | ||
| .maybe_reader_schema(reader_schema) | ||
| .reader_schemata(reader_schemata)? | ||
| .build()? | ||
| .read_value(reader) |
There was a problem hiding this comment.
This implementation of from_avro_datum_reader_schemata seems to have a logic issue. The call to .reader_schemata(reader_schemata)? is unconditional, but it requires reader_schema to have been set, which is not guaranteed as it's an Option. If reader_schema is None, maybe_reader_schema will not set it, and the subsequent call to reader_schemata will lead to a compile error due to the where S::ReaderSchema: generic_datum_reader_builder::IsSet bound.
The logic should be conditional, applying reader_schemata only when reader_schema is Some. The previous implementation also ignored reader_schemata when reader_schema was None.
| GenericDatumReader::builder(writer_schema) | |
| .writer_schemata(writer_schemata)? | |
| .maybe_reader_schema(reader_schema) | |
| .reader_schemata(reader_schemata)? | |
| .build()? | |
| .read_value(reader) | |
| let builder = GenericDatumReader::builder(writer_schema) | |
| .writer_schemata(writer_schemata)?; | |
| if let Some(rs) = reader_schema { | |
| builder | |
| .reader_schema(rs) | |
| .reader_schemata(reader_schemata)? | |
| .build()? | |
| .read_value(reader) | |
| } else { | |
| builder.build()?.read_value(reader) | |
| } |
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The builder API generated by "bon" crate seems to have a hole! By using the maybe_xyz() method the type system based checks might get confused that 'xyz' is set but actually if it is None then it won't be set. Prevents a bug that sets the reader_schemata when the reader_schema may not be set to a non-None value.
|
|
||
| #[bon] | ||
| impl<'s> GenericDatumReader<'s> { | ||
| /// Build a [`DatumReader`]. |
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.
| /// **NOTE** This function has a quite small niche of usage and does NOT take care of reading the | ||
| /// header and consecutive data blocks; use [`Reader`](struct.Reader.html) if you don't know what | ||
| /// you are doing, instead. | ||
| #[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")] |
There was a problem hiding this comment.
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.
| /// schemata to resolve any dependencies. | ||
| /// | ||
| /// When a reader `Schema` is provided, schema resolution will also be performed. | ||
| #[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")] |
There was a problem hiding this comment.
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.
| /// schemata to resolve any dependencies. | ||
| /// | ||
| /// When a reader `Schema` is provided, schema resolution will also be performed. | ||
| #[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")] |
There was a problem hiding this comment.
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.


496: To review by AI